
Tutorial for benchmarking #2499

Merged: 13 commits merged into main on Jul 10, 2025

Conversation

Contributor

@jainapurva jainapurva commented Jul 7, 2025

This pull request introduces comprehensive documentation updates for the TorchAO benchmarking framework:

  • docs/source/benchmarking_overview.md: Added a detailed tutorial on using the TorchAO benchmarking framework. It includes steps for adding APIs, model architectures, and CI dashboard integration, along with troubleshooting tips and best practices.

  • docs/source/benchmarking_user_faq.md: Introduced a new FAQ section to address common benchmarking use cases, with a placeholder for future content.


pytorch-bot bot commented Jul 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/2499

Note: Links to docs will display an error until the docs builds have been completed.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed label Jul 7, 2025
@jainapurva jainapurva marked this pull request as ready for review July 7, 2025 23:56
Contributor

@Copilot Copilot AI left a comment


Pull Request Overview

This PR introduces a new microbenchmarking tutorial for the TorchAO framework and integrates it into the documentation index.

  • Adds a comprehensive microbenchmarking.rst tutorial covering API/model integration, local benchmarking, and CI dashboard setup.
  • Updates index.rst to include the new microbenchmarking tutorial in the toctree.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| docs/source/microbenchmarking.rst | New tutorial for using the TorchAO benchmarking framework |
| docs/source/index.rst | Added microbenchmarking entry to the documentation index |

Comments suppressed due to low confidence (3)

docs/source/microbenchmarking.rst:6

  • The :ref: links to sections are missing corresponding label targets. Please add explicit rst labels (e.g. .. _add-api-to-benchmarking-recipes:) before each section heading so these references resolve correctly.
1. :ref:`Add an API to benchmarking recipes`

docs/source/microbenchmarking.rst:148

  • [nitpick] Grammar suggestion: change to "The output generated after running the benchmarking script is in the form of a CSV file." to improve readability and accuracy.
The output generated after running the benchmarking script, is the form of a csv. It'll contain the following:

docs/source/microbenchmarking.rst:38

  • There's a typo: “it-width” should be “bit-width” and consider rephrasing "appended to the string config in input" to improve clarity.
  If the ``AOBaseConfig`` uses input parameters, like bit-width, group-size etc, you can pass them appended to the string config in input
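As a reading aid for this transcript, here is a rough sketch of how a hyphen-separated recipe string like ``gemlitewo-<bit_width>-<group_size>`` could map onto a config object. The ``string_to_config`` helper, the import path, and the exact constructor parameters of ``GemliteUIntXWeightOnlyConfig`` are assumptions for illustration; the framework's real parsing logic lives in the benchmarking code and may differ.

```python
# Hypothetical sketch only: maps a recipe string like "gemlitewo-4-64" to a config.
# The helper name, import path, and constructor arguments are assumptions, not
# TorchAO's actual parsing API.
from torchao.quantization import GemliteUIntXWeightOnlyConfig


def string_to_config(recipe: str):
    """Parse '<name>-<bit_width>-<group_size>' into a config instance."""
    name, *params = recipe.split("-")
    if name == "gemlitewo":
        bit_width, group_size = (int(p) for p in params)
        return GemliteUIntXWeightOnlyConfig(bit_width=bit_width, group_size=group_size)
    raise ValueError(f"Unknown recipe string: {recipe}")


config = string_to_config("gemlitewo-4-64")  # bit_width=4, group_size=64
```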

Comment on lines 21 to 24
quantization
sparsity
contributor_guide
microbenchmarking

Copilot AI Jul 7, 2025


[nitpick] Consider alphabetizing the toctree entries (e.g., contributor_guide, microbenchmarking, quantization, sparsity) to keep navigation consistent.

Suggested change (reorder the toctree entries alphabetically):

From:
    quantization
    sparsity
    contributor_guide
    microbenchmarking

To:
    contributor_guide
    microbenchmarking
    quantization
    sparsity


@jainapurva jainapurva added the topic: documentation and topic: for developers labels Jul 7, 2025
If the ``AOBaseConfig`` uses input parameters, like bit-width, group-size etc, you can pass them appended to the string config in input
For example, for ``GemliteUIntXWeightOnlyConfig`` we can pass it-width and group-size as ``gemlitewo-<bit_width>-<group_size>``

2. Add a Model to Benchmarking Recipes
Contributor


For this one, can people add Hugging Face models easily?

Contributor Author


If by adding a model we mean adding an architecture, or custom shapes for existing architectures, then they can. This is a microbenchmarking recipe, hence to add a model we'll need to add its full architecture (like llama) for generating lower-level benchmarking numbers. Micro-benchmarking doesn't support specifying an hf-model name and importing it; that functionality can be included in future developments.
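To make that distinction concrete, here is a toy sketch of the kind of self-contained architecture a microbenchmarking recipe expects: a full module definition rather than a Hugging Face model name. The class name, shapes, and registration details below are illustrative assumptions; how a model actually gets registered with the framework is covered in the tutorial itself.

```python
# Illustrative only: a minimal, fully specified architecture of the sort a
# microbenchmark recipe needs. Class name, shapes, and registration details
# are assumptions, not the framework's actual model definitions.
import torch
import torch.nn as nn


class ToyLinearModel(nn.Module):
    """A tiny stand-in architecture with configurable matrix shapes."""

    def __init__(self, k: int = 1024, n: int = 1024, dtype=torch.bfloat16):
        super().__init__()
        self.linear = nn.Linear(k, n, bias=False, dtype=dtype)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


device = "cuda" if torch.cuda.is_available() else "cpu"
model = ToyLinearModel().to(device).eval()
x = torch.randn(8, 1024, dtype=torch.bfloat16, device=device)
with torch.no_grad():
    out = model(x)  # sanity check that the architecture runs end to end
```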


This tutorial will guide you through using the TorchAO microbenchmarking framework. The tutorial contains different use cases for benchmarking your API and integrating with the dashboard.

1. :ref:`Add an API to benchmarking recipes`
Contributor

@jerryzh168 jerryzh168 Jul 8, 2025


nit: I think the order should be from most used to least used. Also, the doc feels more like a developer-facing "API reference" which centers around how to develop the benchmarking tool further (add new things, etc.), instead of "using" the benchmarking tool.

Wondering if this could be structured in a way that is more user-centered, that is, structured in terms of use cases / user flows:

  • what is the most common use case, and what should be the flow / interaction with the tool?

and introduce what people need to do in different use cases.

One use case I have is that I added a new tensor, e.g. for fbgemm, and I'd like to know how fbgemm quant compares against existing quant on existing microbenchmarks or some models.

e.g. how can the benchmarking tool simplify the changes needed to generate these tables: #2273 and #2276

Another could be kernel developers who are writing cutlass or triton kernels: what are the pieces they need to interact with?

And we can then optimize / simplify each flow after we have that.

But maybe I'm talking about a different doc; this doc can be useful as an "API reference" as well.

Contributor Author


Thanks @jerryzh168, this is a very helpful suggestion. There should be two docs: one for the API reference, and another for different use-cases, something like an FAQ or help doc.

Contributor

@drisspg drisspg left a comment


One minor overall nit is that I think we should write docs in markdown as often as possible since it is much more of a lingua franca

2. On the right sidebar, find the "Labels" section.
3. Click on the "Labels" dropdown and select "ciflow/benchmark" from the list of available labels.

Adding this label will automatically trigger the benchmarking CI workflow for your pull request.
Contributor


Can you also add where we can see the results?

Contributor


Also, will it run after each new commit is added, or when an existing commit is updated?

Contributor Author


If we add the label, it'll run for every commit we add to the PR.

4. In the dropdown menu, select the branch.
5. Click the "Run workflow" button to start the benchmarking process.

This will execute the benchmarking workflow on the specified branch, allowing you to evaluate the performance of your changes.
Contributor


What happens when people (1) push a new commit, or (2) update an existing commit?

Contributor


Also, when do we use (1) and when do we use (2)? They seem to be doing the same thing; if so, maybe keeping just one is enough.

Contributor Author


This will run only for the latest change on the branch; it won't trigger automatically on every commit. For that, we'll need to add the label to the PR.

Contributor Author

@jainapurva jainapurva Jul 9, 2025


Addressed in PR: #2512

Comment on lines 120 to 127
### Interpreting Results

The benchmark results include:

- **Speedup**: Performance improvement compared to baseline (bfloat16)
- **Memory Usage**: Peak memory consumption during inference
- **Latency**: Time taken for inference operations
- **Profiling Data**: Detailed performance traces (when enabled)
Contributor


Is there some duplication between these and L80-L84?

- **Latency**: Time taken for inference operations
- **Profiling Data**: Detailed performance traces (when enabled)

Results are saved in CSV format with columns for:
Contributor

@jerryzh168 jerryzh168 Jul 9, 2025


nit: I think you can just show a small example output here
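For illustration only, here is a sketch of what a couple of result rows might look like, built around the metrics listed above (speedup versus the bfloat16 baseline, peak memory, latency). The column names and every number below are placeholders invented for this sketch, not output from the framework.

```python
# Placeholder illustration of the CSV shape described above. Column names and
# all values are made up for the example; they are not real benchmark results.
import csv
import io

sample_csv = """\
model,quantization,speedup,peak_memory_mb,latency_ms
toy_linear,baseline,1.00,780.0,5.8
toy_linear,gemlitewo-4-64,1.70,512.0,3.4
"""

for row in csv.DictReader(io.StringIO(sample_csv)):
    print(f"{row['quantization']:>16}: {row['speedup']}x vs bfloat16 baseline")
```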


This guide is intended to provide instructions for the most fequent benchmarking use-case. If you have any use-case that is not answered here, please create an issue here: [TorchAO Issues](https://github.com/pytorch/ao/issues)

## Table of Contents
Contributor

@jerryzh168 jerryzh168 Jul 9, 2025


This is closer to use cases but still not really end use cases yet, I feel. I think it might be helpful to describe scenarios like:

(1) integrating new quantization techniques (e.g. a new float8 technique)
People might be interested in understanding the overall performance, accuracy, and memory-footprint impact of the new technique and how it compares to existing ones.
In terms of commits, only the last commit is important here, I think.

(2) kernel optimizations
People might be interested only in performance, and maybe the trace is also important.
They may put up a PR and go through multiple commits, or just update a single commit multiple times, and they want to understand the performance differences between commits.

(3) performance regression tracking
What is the entry point for this one? Will people receive an email when some threshold is passed?

(4) end users
Is there a dashboard we can show end users so that, if they are using torchao, they can understand what speedup and accuracy drop to expect for a certain technique, device, and model architecture, etc.?

Contributor Author


Will be addressed in PR #2512

Contributor Author

jainapurva commented Jul 9, 2025

@jerryzh168 As discussed offline, I'll address your comments in the follow-up PR for the end-user tutorial #2512

@@ -0,0 +1,215 @@
# Benchmarking Overview
Contributor


Also, "API Guide" might be a more accurate title.

@jainapurva jainapurva merged commit 64c1ce3 into main Jul 10, 2025
19 of 21 checks passed